Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64: Implement arraytranslateTRTO255 #7499

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

knn-k
Copy link
Contributor

@knn-k knn-k commented Oct 24, 2024

This commit implements arraytranslateTRTO255 for AArch64.

@knn-k
Copy link
Contributor Author

knn-k commented Oct 24, 2024

I will add support for other arraytranslateTRTO and arraytranslateTROT operations in separate PRs later.

@knn-k
Copy link
Contributor Author

knn-k commented Oct 28, 2024

Fixed a variable name.

TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0));
TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1));
TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4));
TR::Register *outputLenReg = cg->allocateRegister();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Children 2,3,5 still need to be evaluated even if they are "dummy".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arraytranslate evaluators of p and x platforms leave those dummy children unevaluated.

  • p:
    TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0));
    TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1));
    TR::Register *stopCharReg = arraytranslateTRTO255 ? NULL : cg->gprClobberEvaluate(node->getChild(3));
    TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4));
  • x:
    bool stopUsingCopyReg1 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(0), srcPtrReg, cg);
    bool stopUsingCopyReg2 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(1), dstPtrReg, cg);
    bool stopUsingCopyReg4 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(3), termCharReg, cg);
    bool stopUsingCopyReg5 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(4), lengthReg, cg);

Reference counts of all the child nodes are decremented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I think you can only get away with that if they are constants. Are they?

Copy link
Contributor Author

@knn-k knn-k Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those dummy nodes are constants.
See an example arraytranslate node at #7499 (comment).

compiler/aarch64/codegen/OMRTreeEvaluator.cpp Show resolved Hide resolved
if (!disableTRTO255)
{
cg->setSupportsArrayTranslateTRTO255();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think support for this opcode should be enabled in OMR when there isn't an implementation for it in OMR (the helper implementation is missing). In the absence of that, this logic should be enabled in the downstream projects that do provide the helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both p and x platforms do the same: Implement the helper functions in the downstream project, and enable the feature in OMR.
On the other hand, z inlines the vectorized code instead of calling helper functions.

I can move the implementation of the AArch64 helper functions to OMR. In that case, maybe we want to move the helper functions for p and x to OMR for consistency.

[p]

[x]

compiler/aarch64/codegen/OMRTreeEvaluator.cpp Show resolved Hide resolved
// (2) translation table (dummy)
// (3) stop character (terminal character, either 0xff00ff00 (ISO8859) or 0xff80ff80 (ASCII)
// (4) input length (in elements)
// (5) stopping char (dummy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the dummy nodes used by other architectures? Just wondering why they're dummy on AArch64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is different between p/x/aarch64 and z. See the code below:

if (cg()->getSupportsArrayTranslateTRTO255()|| cg()->getSupportsArrayTranslateTRTO())
{
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0);
if (isISO88591Encoder)
termchar = 0xff00ff00;
else
termchar = 0xff80ff80;
}
else //z
{
bool genSIMD = comp()->cg()->getSupportsVectorRegisters() && !comp()->getOption(TR_DisableSIMDArrayTranslate);
stopIndex = isISO88591Encoder ? 255: 127;
termchar = isISO88591Encoder ? 0x0B: 0xff;
if (genSIMD)
{
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0); //dummy table node, it's not gonna be used
}
else
{
uint8_t *table = (uint8_t*)comp()->trMemory()->allocateMemory(65536, stackAlloc);
int i;
for (i = 0; i <= stopIndex; i++)
table[i] = (uint8_t)i;
for (i = stopIndex+1; i < 65536; i++)
table[i] = (uint8_t)termchar;
tableNode = createTableLoad(comp(), callNode, 16, 8, table, false);
stopIndex=-1;
}
}

The code for z from line 2543 generates a non-dummy node for child(2) at line 2562, while the non-z path generates a dummy aconst 0 at line 2537.
AArch64 takes the non-z path.

Copy link
Contributor Author

@knn-k knn-k Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of the arraytranslate node for TRTO255:

 n819n    (  0)  treetop                                                                              [       0x12313efb0] bci=[0,13,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1
 n818n    (  1)    arraytranslate  <arraytranslate>[#275  helper Method] [flags 0x400 0x0 ] (in GPR_0115) (char2byteXlate byteArrayXlate tableBackedByRawStorage )  [       0x12313ef60] bci=[0,13,180] rc=1 vc=2947 vn=- li=57 udi=13472 nc=6 flg=0xa020
 n795n    (  0)      aladd (in GPR_0112) (X>=0 internalPtr )                                          [       0x12313e830] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=12416 nc=2 flg=0x8100
 n460n    (  0)        ==>aRegLoad (in &GPR_0016)
 n797n    (  0)        lsub (in GPR_0017) (X>=0 cannotOverflow )                                      [       0x12313e8d0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=29968 nc=2 flg=0x1100
 n798n    (  0)          lshl (in GPR_0017) (X>=0 )                                                   [       0x12313e920] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=29968 nc=2 flg=0x100
 n799n    (  0)            i2l (highWordZero X>=0 )                                                   [       0x12313e970] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1 flg=0x4100
 n12n     (  0)              ==>iRegLoad (in GPR_0017) (cannotOverflow )
 n803n    (  0)            iconst 1 (Unsigned X!=0 X>=0 )                                             [       0x12313eab0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x4104
 n804n    (  0)          lconst -16 (X!=0 X<=0 )                                                      [       0x12313eb00] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
 n805n    (  0)      aladd (in GPR_0113) (X>=0 internalPtr )                                          [       0x12313eb50] bci=[0,34,185] rc=0 vc=2947 vn=- li=57 udi=12960 nc=2 flg=0x8100
 n7n      (  0)        ==>newarray (in &GPR_0038) (X!=0 )
 n807n    (  0)        lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )                             [       0x12313ebf0] bci=[0,34,185] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x5104
 n1286n   (  0)      iconst 0 (X==0 X>=0 X<=0 )                                                       [       0x1234d81b0] bci=[-1,0,161] rc=0 vc=2947 vn=- li=- udi=- nc=0 flg=0x302
 n816n    (  0)      iconst 0xff00ff00 (X!=0 X<=0 )                                                   [       0x12313eec0] bci=[0,10,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
 n815n    (  0)      i2l (in GPR_0114) (highWordZero X>=0 )                                           [       0x12313ee70] bci=[0,5,179] rc=0 vc=2947 vn=- li=57 udi=13216 nc=1 flg=0x4100
 n5n      (  1)        ==>iRegLoad (in GPR_0018) (cannotOverflow )
 n1287n   (  0)      iconst -1 (X!=0 X<=0 )                                                           [       0x1234d8200] bci=[-1,0,161] rc=0 vc=2947 vn=- li=- udi=- nc=0 flg=0x204

This commit implements arraytranslateTRTO255 for AArch64.

Signed-off-by: KONNO Kazuhiro <[email protected]>
@knn-k
Copy link
Contributor Author

knn-k commented Oct 31, 2024

Jenkins build aarch64,amac

@knn-k
Copy link
Contributor Author

knn-k commented Oct 31, 2024

I responded to all the review comments so far.

Copy link
Contributor

@0xdaryl 0xdaryl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assembly code looks reasonable to me.

compiler/aarch64/codegen/OMRTreeEvaluator.cpp Show resolved Hide resolved
TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0));
TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1));
TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4));
TR::Register *outputLenReg = cg->allocateRegister();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I think you can only get away with that if they are constants. Are they?

@0xdaryl
Copy link
Contributor

0xdaryl commented Nov 6, 2024

Jenkins build alinux,amac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants